Abstract: Extremes of information combining inequalities play
an important role in the analysis of sparse-graph codes under
message-passing decoding. We introduce new tools for the
derivation of such inequalities, and show by means of concrete
examples how they can be applied to solve some optimization
problems in the analysis of low-density parity-check codes.

I. Setting 

In order to understand iterative decoding of low-density
parity-check (LDPC) codes, two operations need to be studied:
the variable node convolution $\circledast$ and the check node
convolution $\boxtimes$. They correspond to the merging of
information by variable nodes and by check nodes, respectively,
in the iterative decoding process. The reader is assumed to be
familiar with LDPC codes as well as with the formalism of
modeling channels by densities. A very complete introduction
to the topic is [1].

The notion of extremes of information combining (EIC) was
introduced by I. Land, P. Hoeher, S. Huettinger, and J. B.
Huber in [2], and further extended by I. Sutskover, S. Shamai,
and J. Ziv, see [3] or [6]. The idea of EIC is to associate to
densities certain functionals, e.g. the entropy functional, and
to see how these functionals behave under the combining of
information, i.e. under the two kinds of convolutions. The
purpose of this work is to solve optimization problems that
arise in this setting. We will focus solely on the check
convolution $\boxtimes$, although many statements can be proven
in the same way for the variable node convolution.

A. Notations 

There are several representations for a binary memoryless
symmetric-output (BMS) channel. As is done for instance in [1],
we see a BMS channel $a$ as a convex combination of binary
symmetric channels (BSC), given by a weight distribution $w_a$.
Then we have (by definition)

Example 1 (Binary Symmetric Channel BSC($\epsilon$)).

$$w_{\mathrm{BSC}(\epsilon)} = \delta_{\epsilon}$$

Example 2 (Binary Erasure Channel BEC($\epsilon$)).

$$w_{\mathrm{BEC}(\epsilon)} = \bar{\epsilon}\,\delta_{0} + \epsilon\,\delta_{1/2}, \qquad \bar{\epsilon} = 1 - \epsilon$$


The functionals of interest in this domain are

$$E(a) = \int_0^{1/2} dw_a(\epsilon)\, \epsilon$$

$$H(a) = \int_0^{1/2} dw_a(\epsilon)\, h_2(\epsilon)$$

$$B(a) = \int_0^{1/2} dw_a(\epsilon)\, 2\sqrt{\epsilon(1 - \epsilon)}$$

which we call respectively the error probability, the entropy
and the Bhattacharyya functional. These can all be thought of
as measures of the channel quality. They are equal to 0 for the
perfect channel; for a useless channel, $H$ and $B$ are equal
to 1 and $E$ is equal to 1/2. Applying these functionals to the
check convolution of two densities corresponds to

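$$E(a \boxtimes b) = \frac{1 - (1 - 2E(a))(1 - 2E(b))}{2}$$

$$H(a \boxtimes b) = \int dw_a(\epsilon)\, dw_b(\epsilon')\, h_2\left(\frac{1 - (1 - 2\epsilon)(1 - 2\epsilon')}{2}\right)$$

$$B(a \boxtimes b) = \int dw_a(\epsilon)\, dw_b(\epsilon')\, \sqrt{1 - \left((1 - 2\epsilon)(1 - 2\epsilon')\right)^2}$$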

In the sequel we will frequently refer to the following two
functions, $f_H$ and $f_B$:

$$f_H : x \in [0; 1] \mapsto h_2\left(\frac{1 - x}{2}\right)$$

$$f_B : x \in [0; 1] \mapsto \sqrt{1 - x^2}$$
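
The definitions above can be tried out numerically. The following is a minimal sketch, assuming a channel is stored as a finite list of (weight, crossover probability) pairs; the helper names (`bsc`, `bec`, `check_conv`, ...) are illustrative and not taken from the paper.

```python
import math

def h2(x):
    """Binary entropy function in bits, with h2(0) = h2(1) = 0."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

# A channel is a list of (weight, epsilon) pairs: a finite mixture of BSC(epsilon).
def bsc(eps):
    return [(1.0, eps)]

def bec(eps):
    # Weight 1 - eps on the perfect channel BSC(0), weight eps on the useless BSC(1/2).
    return [(1.0 - eps, 0.0), (eps, 0.5)]

def E(a):
    """Error probability functional."""
    return sum(w * e for w, e in a)

def H(a):
    """Entropy functional."""
    return sum(w * h2(e) for w, e in a)

def B(a):
    """Bhattacharyya functional."""
    return sum(w * 2.0 * math.sqrt(e * (1.0 - e)) for w, e in a)

def check_conv(a, b):
    """Check node convolution: BSC(e) combined with BSC(e') is
    BSC((1 - (1 - 2e)(1 - 2e')) / 2); mixtures combine term by term."""
    return [(wa * wb, (1.0 - (1.0 - 2.0 * ea) * (1.0 - 2.0 * eb)) / 2.0)
            for wa, ea in a for wb, eb in b]

a, b = bec(0.3), bsc(0.1)
ab = check_conv(a, b)
print(E(ab), (1.0 - (1.0 - 2.0 * E(a)) * (1.0 - 2.0 * E(b))) / 2.0)
```

The two printed numbers should agree, which is the closed form for the error probability of a check convolution.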

B. Motivation 

A classical result in EIC, shown in [2] and [3], is the
following.

Theorem I.1. Let $b_0$ be any BMS channel. Amongst channels
$a$ with fixed entropy $H(a) = h$, $H(a \boxtimes b_0)$ is

• minimized by the BSC($h_2^{-1}(h)$) and

• maximized by the BEC($h$).

A quick and useful application of Theorem I.1 is to give bounds
on the thresholds of LDPC codes. The same statement can be
made with the Bhattacharyya functional $B$. We will derive an
alternate (calculus-free) proof of the second item in Section IV.
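
As an illustration, the ordering stated in the theorem can be observed on a small example. This is only a sketch under assumptions: channels are finite mixtures of BSCs stored as (weight, crossover probability) pairs, the channel `b0` and the test mixture `mix_h` are arbitrary choices, and the helper names are hypothetical.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def h2_inv(h):
    """Inverse of h2 on [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if h2(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def H_check(a, b):
    """Entropy of the check convolution of a and b (lists of (weight, epsilon))."""
    return sum(wa * wb * h2((1.0 - (1.0 - 2.0 * ea) * (1.0 - 2.0 * eb)) / 2.0)
               for wa, ea in a for wb, eb in b)

h = 0.5
b0 = [(0.6, 0.05), (0.4, 0.3)]                    # an arbitrary BMS channel
bsc_h = [(1.0, h2_inv(h))]                        # the BSC with entropy h
bec_h = [(1.0 - h, 0.0), (h, 0.5)]                # the BEC with entropy h
mix_h = [(0.5, h2_inv(0.2)), (0.5, h2_inv(0.8))]  # another channel with entropy h

H_bsc, H_mix, H_bec = (H_check(c, b0) for c in (bsc_h, mix_h, bec_h))
print(H_bsc, H_mix, H_bec)
```

The printed values should be in increasing order, with the BSC and BEC at the two ends.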

Sometimes one might need to deal with non-linear expressions
such as $H(a^{\boxtimes 4}) - H(a^{\boxtimes 12})$. Let us sketch
very loosely, following [4], how such expressions can appear.
Apart from the Shannon threshold, another threshold called the
Area threshold can be defined. The Area threshold depends on
the code and channel family under consideration. In the case
of a code taken from the $(d_l, d_r)$ regular ensemble, one can
compute this threshold $h^A$.

Consider a code taken from the $(d_l, d_r)$ regular ensemble,
and transmission over a "gentle" channel family
$\{c_\sigma\}_\sigma$, that is, a family that is smooth, ordered,
and complete.¹ "Ordered" means that the bigger the channel
parameter $\sigma$, the worse the channel is; in other words,
all the functionals introduced above increase with $\sigma$.
"Smooth" means we can differentiate with respect to $\sigma$
under the integrals.

Then one can define a GEXIT curve in the following manner.
Take a fixed point (FP) $(c_\sigma, x)$ and define
$y = x^{\boxtimes d_r - 1}$. Then plot

$$\left(H(c_\sigma),\; G(c_\sigma, y^{\circledast d_l})\right).$$

Here $G(c_\sigma, \cdot) = \frac{\partial H(c_\sigma \circledast \cdot)/\partial\sigma}{\partial H(c_\sigma)/\partial\sigma}$.
In the case of the BEC, changing the channel parameter $\sigma$
corresponds to revealing certain bits, and the kernel
$G(c_\sigma, \cdot)$ represents the probability that this bit
was not previously known from the observation of the value
of other neighboring bits.² In general everything has the same
meaning but with soft information.

The kernel models how much more (compared to if we 
use only extrinsic observations) information is known about a 
generic bit, if the channel is made slightly better. For instance 
if the channel changes from being useless (H(c) = 1) to 
slightly better, all the information we get is useful because 
with a useless channel nothing is known. So there is a point 
at (1,1). 

So intuitively, the area below this curve (assuming it exists
and is smooth) between $h$ and 1 should be a measure, in bits,
of the total useful information that we get through BP decoding
for $\sigma$ s.t. $H(c_\sigma) = h$. As the rate of the code is
roughly $1 - \frac{d_l}{d_r}$, $1 - \frac{d_l}{d_r}$ bits of
information are enough to fully determine a codeword.

It is then natural to define the Area threshold $h^A$ as the
point on the horizontal axis s.t. the area below the curve from
$h^A$ to 1 is equal to the design rate $1 - \frac{d_l}{d_r}$.
Of course this notion depends on the channel family.

However, an iterative decoder like BP might not be able to
"use" all this information.³ So in general $h^{BP} \le h^A$.

It would make sense that $h^{MAP} = h^A$, although in the
general setting all that is known is $h^{MAP} \le h^A$.

In [4], it is shown that the value of the integral from $h$ to
1 is

$$1 - \frac{d_l}{d_r} - h - \left(d_l - 1 - \frac{d_l}{d_r}\right) H(x^{\boxtimes d_r}) + (d_l - 1)\, H(x^{\boxtimes d_r - 1}) \tag{1}$$

where $x$ is "the" BP fixed point with entropy $h$ for the
channel family under consideration.

¹ The definitions of these terms can be found for instance in [1].
Examples of such families include, amongst many others, the
$\{\mathrm{BEC}(h)\}_{h=0}^{1}$ and the $\{\mathrm{BSC}(h)\}_{h=0}^{1}$,
as well as combinations of these two, and other classical families
like the $\{\mathrm{BAWGNC}(\sigma)\}_{\sigma=0}^{\infty}$.

² "Neighbors" is to be understood in the sense of the Tanner graph,
as usual.

³ Think of the BEC, for which what BP does is solve a system of
equations by iteratively solving equations where all variables are
known but one. Even if the system is full rank, there might still be
large portions that remain unknown to BP.

The value of $h^A$ turns out to be the right bound of the
domain where the following holds

$$-h - \left(d_l - 1 - \frac{d_l}{d_r}\right) H(x^{\boxtimes d_r}) + (d_l - 1)\, H(x^{\boxtimes d_r - 1}) \ge 0. \tag{2}$$

Here, $x$ is "the" density evolution fixed point with entropy
$h$, using belief propagation (BP) decoding. In [4] it is shown
that indeed (2) holds, universally over all BMS channels $x$
with entropy lower than or equal to $\frac{d_l}{d_r}$, in the
asymptotic regime $d_l, d_r \to \infty$ with $\frac{d_l}{d_r}$
fixed. This implies that the Area threshold universally
approaches the Shannon threshold. We will derive another proof
of this fact in Section V (see Proposition V.2).

In [4] it is then shown that a class of spatially coupled codes
achieves the Area threshold under BP decoding. This, combined
with the fact above, gives a new way to achieve capacity.

II. Results 

Our results fit in a slightly more general framework than
that of Theorem I.1: we will consider expressions of the type
$\Phi(p(a))$ where $p$ is a polynomial, and $\Phi$ is either
$H$ or $B$. We use the following notation.

Notation. Let $p(X) = \sum_i c_i X^i$ be any polynomial s.t.
$p(0) = 0$,⁴ i.e. $c_0 = 0$, and $\Phi$ be one of the
functionals above. We will use the convention

$$\Phi(p(a)) = \sum_i c_i\, \Phi(a^{\boxtimes i}) \tag{3}$$


The following two statements are our main results. We prove 
them in the next section. 

Proposition II.1. Let $p$ be any polynomial s.t. $p(0) = 0$, and
$\Phi$ be $H$ or $B$. Consider the following problem

MAX $\Phi(p(a))$
s.t. $\Phi(a) = \phi_0$

Then, if $p$ is ∪-convex over $[0; f_\Phi^{-1}(\phi_0)^2]$, the
BEC solves this problem.

Proposition II.2. Let $p$ be any polynomial s.t. $p(0) = 0$, and
$\Phi$ be $H$ or $B$. Consider the following problem

OPT $\Phi(p(a))$
s.t. $E(a) = \epsilon$

Then, if $p$ is increasing over $[0; 1 - 2\epsilon]$,

• the BEC minimizes this problem. 

• the BSC maximizes this problem. 

Discussion. The hypotheses for these propositions are probably
not tight; they just ease the proofs. The reader should not
pay too much attention to the obscure terms
$f_\Phi^{-1}(\phi_0)^2$.

⁴ Instead of considering polynomials which vanish at 0, we could
use a convention like $a^{\boxtimes 0}$ = "Perfect Channel".



The maximizing part in the previous result (Theorem I.1) follows
as a special case of Proposition II.1 with $p = X^d$. Our
improvement, technically speaking, is dealing with other
polynomials than $X^d$.

Proposition II.1 only addresses half of the question. We
suspect that in most cases the minimizer is the BSC, and pose
this as an interesting open question. Dealing with the problem
(2) requires that we have a lower bound. This is the purpose
of the following lemma.

Lemma II.3. Suppose $p$ is increasing over
$[0; f_\Phi^{-1}(\phi_0)^2]$. Then, for all channels $a$ with
$\Phi(a) = \phi_0$,

$$\Phi(p(a)) \ge p(1) - p\left(f_\Phi^{-1}(\phi_0)^2\right) \tag{4}$$

III. Proofs

Before we start the proof, a few preliminary observations
are needed.

A. Preliminary observations

Let $\Phi$ be either $H$, the entropy, or $B$, the Bhattacharyya
functional. In both cases the "kernel" $f_\Phi$ can be expanded
in power series,

$$1 - f_\Phi(X) = \sum_{n \ge 1} \alpha_{\Phi,n} X^{2n}$$

where equality still holds for $X = 1$. The crucial property of
$(\alpha_{\Phi,n})_n$ is that all the terms are positive and
furthermore

$$\sum_{n \ge 1} \alpha_{\Phi,n} = 1$$

The explicit formulas are

$$\alpha_{H,n} = \frac{1}{2 \log(2)\, n (2n - 1)}$$

$$\alpha_{B,n} = \frac{\binom{2n}{n}}{(2n - 1)\, 4^n}$$

This expansion can be plugged in the definition of $\Phi(a)$ to
yield

$$1 - \Phi(a) = 1 - \int dw_a(\epsilon)\, f_\Phi(1 - 2\epsilon) = \sum_{n \ge 1} \alpha_{\Phi,n} \int dw_a(\epsilon)\, (1 - 2\epsilon)^{2n}$$

and we can proceed in a similar fashion for $\Phi(a \boxtimes b)$
or more complicated expressions.

Definition III.1 (moments). For a channel $a$, its $n$-th moment
is defined by

$$\gamma_{a,n} = \int dw_a(\epsilon)\, (1 - 2\epsilon)^{2n}$$

We call the $\gamma_{a,n}$'s moments even if, strictly speaking,
they are not. Note that in terms of moments, the BEC is
characterized by having all its moments equal, and the BSC by
having moments that decrease geometrically.

Example 3. Fix a value $\phi_0$, where $\Phi$ is either the
Bhattacharyya functional or the entropy. Consider the BEC and
the BSC s.t. $\Phi(\cdot)$ is equal to $\phi_0$. Then,

$$\gamma_{\mathrm{BEC},n} = 1 - \phi_0$$

$$\gamma_{\mathrm{BSC},n} = f_\Phi^{-1}(\phi_0)^{2n}$$

With this definition

$$1 - \Phi(a) = \sum_{n \ge 1} \alpha_{\Phi,n}\, \gamma_{a,n} \tag{5}$$

Note also that if $\Phi = H$, then $1 - \Phi$ is no other than
$C$, the capacity functional. Also, using Fubini, we see that

$$\int dw_a(\epsilon)\, dw_b(\epsilon')\, (1 - 2\epsilon)^{2n} (1 - 2\epsilon')^{2n} = \gamma_{a,n}\, \gamma_{b,n}$$

and it follows that

$$1 - \Phi(a \boxtimes b) = \sum_n \alpha_{\Phi,n}\, \gamma_{a,n}\, \gamma_{b,n} \tag{6}$$

and this yields straightforwardly

$$1 - \Phi(a^{\boxtimes d}) = \sum_n \alpha_{\Phi,n}\, \gamma_{a,n}^{d} \tag{7}$$

More generally, if $p = \sum_{i \ge 1} c_i X^i$ is a polynomial,

$$\Phi(p(a)) = \sum_i c_i\, \Phi(a^{\boxtimes i})$$

which can be rewritten as

$$\Phi(p(a)) = \sum_i c_i - \sum_n \alpha_{\Phi,n} \sum_i c_i\, \gamma_{a,n}^{i}$$

$$\Phi(p(a)) = p(1) - \sum_n \alpha_{\Phi,n}\, p(\gamma_{a,n}) \tag{8}$$

Although very simple, the expansion above gives an efficient
way to derive numerous bounds. All the proofs presented here
rely heavily on it.

It will be convenient in the sequel to know the range the
moments can achieve. They are decreasing and positive. So
the biggest moment is the first, $\gamma_{a,1}$. The next lemma
states what channel $a$ maximizes $\gamma_{a,1}$.

Lemma III.2. Amongst all channels $a$ s.t. $\Phi(a) = \phi_0$,
the BSC maximizes $\gamma_{a,1}$.

Proof: The function $x \mapsto x^n$ is ∪-convex. Using Jensen's
inequality

$$\gamma_{a,n} = \int dw_a(\epsilon)\, (1 - 2\epsilon)^{2n} \ge \left(\int dw_a(\epsilon)\, (1 - 2\epsilon)^{2}\right)^{n} = \gamma_{a,1}^{n}$$

Then notice

$$1 - \phi_0 = \sum_n \alpha_{\Phi,n}\, \gamma_{a,n} \ge \sum_n \alpha_{\Phi,n}\, \gamma_{a,1}^{n} = 1 - f_\Phi\left(\sqrt{\gamma_{a,1}}\right)$$

Inverting this inequality ($f_\Phi^{-1}$ is decreasing because
$f_\Phi$ is) gives

$$\gamma_{a,1} \le f_\Phi^{-1}(\phi_0)^2$$

The bound is attained by and only by the BSC, for which
indeed

$$\gamma_{\mathrm{BSC},1} = f_\Phi^{-1}(\phi_0)^2 \tag{9}$$

Notation. We may write $\gamma_1$ instead of $\gamma_{\mathrm{BSC},1}$.
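
The power series coefficients and the moments are easy to evaluate numerically. The sketch below assumes channels are finite mixtures of BSCs stored as (weight, crossover probability) pairs and truncates the series at a fixed number of terms; the function names and the truncation length are illustrative choices, not part of the paper.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def alpha_H(n):
    """n-th coefficient of the expansion of 1 - f_H."""
    return 1.0 / (2.0 * math.log(2.0) * n * (2 * n - 1))

def alpha_B(n):
    """n-th coefficient of the expansion of 1 - f_B."""
    return math.comb(2 * n, n) / ((2 * n - 1) * 4 ** n)

def moment(a, n):
    """n-th moment of a channel given as a list of (weight, epsilon)."""
    return sum(w * (1.0 - 2.0 * e) ** (2 * n) for w, e in a)

def phi_of_poly(coeffs, a, alpha, terms=400):
    """Truncated evaluation of p(1) - sum_n alpha_n p(gamma_{a,n}),
    where p(X) = sum_i coeffs[i] * X^(i+1), so that p(0) = 0."""
    def p(x):
        return sum(c * x ** (i + 1) for i, c in enumerate(coeffs))
    return p(1.0) - sum(alpha(n) * p(moment(a, n)) for n in range(1, terms + 1))

bsc01 = [(1.0, 0.1)]                 # BSC(0.1): moments 0.64^n, geometric decay
bec03 = [(0.7, 0.0), (0.3, 0.5)]     # BEC(0.3): all moments equal to 0.7

print(phi_of_poly([1.0], bsc01, alpha_H), h2(0.1))        # H(a)
print(phi_of_poly([0.0, 1.0], bsc01, alpha_H), h2(0.18))  # H(a check-convolved with a)
print(phi_of_poly([1.0], bsc01, alpha_B), 0.6)            # B(a)
```

For the BSC the series converges geometrically, so a few hundred terms are plenty. For channels with mass near the perfect channel (for instance the BEC) the tail decays only polynomially and a truncated sum is a cruder approximation.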

Bounds can be used at two different levels. Either we bound
the moments themselves, like in the derivation of (II.3); that
would be the first level. Or we can look at the expressions
from one step further and see $\sum_n \alpha_{\Phi,n}\gamma_{a,n}$
as an expectation $\mathbb{E}(\gamma)$. Here the expectation is
taken w.r.t. a discrete measure given by the weights
$(\alpha_{\Phi,n})$. In this second setup, we can then use
classical inequalities, like the Jensen inequality. That is the
idea of the proof of (II.1).



B. Proof of II.1

Notice that, by assumption and Lemma III.2, the range over which
$p$ is convex covers the values the moments can take.

$$\Phi(p(a)) = p(1) - \sum_n \alpha_{\Phi,n}\, p(\gamma_{a,n})$$

$$\overset{\text{Jensen}}{\le} p(1) - p\left(\sum_n \alpha_{\Phi,n}\, \gamma_{a,n}\right)$$

$$= p(1) - p(1 - \phi_0)$$

To conclude, notice that $p(1) - p(1 - \phi_0) = \Phi(p(\mathrm{BEC}(\phi_0)))$.
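
A small numerical illustration of this bound, and of the lower bound (4), is sketched below for the convex polynomial $p = X^3$ and $\Phi = B$. It assumes channels are stored as finite lists of (weight, $x$) pairs with $x = 1 - 2\epsilon$; the three test channels and the helper names are hypothetical choices with Bhattacharyya parameter 0.5.

```python
import math
from itertools import product

def B_kernel(x):
    """f_B(x) = sqrt(1 - x^2): Bhattacharyya parameter of the BSC with 1 - 2*eps = x."""
    return math.sqrt(max(0.0, 1.0 - x * x))

def B_power(a, d):
    """B of the d-fold check convolution of a, with a given as [(weight, x)].
    Under check convolution the values x = 1 - 2*eps simply multiply."""
    total = 0.0
    for combo in product(a, repeat=d):
        w = math.prod(wi for wi, _ in combo)
        x = math.prod(xi for _, xi in combo)
        total += w * B_kernel(x)
    return total

phi0 = 0.5
bec = [(1.0 - phi0, 1.0), (phi0, 0.0)]                # BEC with B = phi0
bsc = [(1.0, math.sqrt(1.0 - phi0 ** 2))]             # BSC with B = phi0
mix = [(0.5, math.sqrt(1.0 - 0.2 ** 2)), (0.5, 0.6)]  # B = (0.2 + 0.8) / 2 = phi0

values = {name: B_power(c, 3) for name, c in (("bec", bec), ("bsc", bsc), ("mix", mix))}
upper = 1.0 - (1.0 - phi0) ** 3        # p(1) - p(1 - phi0), attained by the BEC
lower = 1.0 - (1.0 - phi0 ** 2) ** 3   # p(1) - p(f_B^{-1}(phi0)^2), the bound (4)
print(values, lower, upper)
```

All three values should lie between `lower` and `upper`, with the BEC sitting exactly at `upper`.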



C. Proof of II.2

Proposition II.2 is a direct corollary of

Lemma III.3. For all $n \in \mathbb{N}$, amongst the channels
$a$ with fixed $E$, say $\epsilon$, the one which minimizes
(resp. maximizes) $\gamma_{a,n}$ is the BSC($\epsilon$)
(resp. the BEC($2\epsilon$)).

Proof of III.3: Even though it is not mandatory to do so,
we can choose, according to the Caratheodory Principle (see VI.1),
to restrict ourselves to combinations of two $\delta$'s,

$$a = \alpha\, \delta_{\epsilon_1} + \bar{\alpha}\, \delta_{\epsilon_2}$$

Then, using the ∪-convexity of $\epsilon \mapsto (1 - 2\epsilon)^{2n}$, we have

$$\underbrace{1 - 2\epsilon}_{\gamma_{\mathrm{BEC},n}} \ge \underbrace{\alpha (1 - 2\epsilon_1)^{2n} + \bar{\alpha} (1 - 2\epsilon_2)^{2n}}_{\gamma_{a,n}} \ge \underbrace{(1 - 2\epsilon)^{2n}}_{\gamma_{\mathrm{BSC},n}}$$

The polynomial $p$ is supposed to be increasing over
$[0; 1 - 2\epsilon]$, that is, over a range that covers all the
values the moments can take. Using this and (III.3), the
optimizers of each term in the series expansion
$\Phi(p(a)) = p(1) - \sum_n \alpha_{\Phi,n}\, p(\gamma_{a,n})$
are the same, so we know they are the global optimizers.
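
The ordering of the moments at fixed error probability can be checked directly on an example. The sketch below assumes channels stored as (weight, crossover probability) pairs; the two-point channel `a` is an arbitrary choice with $E(a) = 0.1$.

```python
def moment(a, n):
    """n-th moment: sum of weight * (1 - 2*eps)^(2n) over the mass points of a."""
    return sum(w * (1.0 - 2.0 * e) ** (2 * n) for w, e in a)

eps = 0.1
a = [(0.75, 0.05), (0.25, 0.25)]                  # E(a) = 0.0375 + 0.0625 = 0.1
bsc = [(1.0, eps)]                                # BSC(eps)
bec = [(1.0 - 2.0 * eps, 0.0), (2.0 * eps, 0.5)]  # BEC(2*eps), error probability eps

for n in range(1, 6):
    print(n, moment(bsc, n), moment(a, n), moment(bec, n))
```

Each printed row should be non-decreasing from left to right, the BEC column staying constant at $1 - 2\epsilon$.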



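D. Proof of II.3

Proof: We simply use $\gamma_{a,n} \le \gamma_{a,1}$ and the
monotonicity of $p$ to get

$$\forall n \quad p(\gamma_{a,n}) \le p(\gamma_{a,1})$$

Then

$$\Phi(p(a)) = p(1) - \sum_n \alpha_{\Phi,n}\, p(\gamma_{a,n}) \ge p(1) - p(\gamma_{a,1}) \underbrace{\sum_n \alpha_{\Phi,n}}_{1}$$

and using Lemma III.2 and again the monotonicity of $p$

$$p(\gamma_{a,1}) \le p(\gamma_{\mathrm{BSC},1}) = p\left(f_\Phi^{-1}(\phi_0)^2\right)$$

(4) follows. ■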

IV. Other Inequalities 

Here we give other inequalities that can be derived using
the power series expansion, just as in the proofs of (II.1) and
(II.2). We will only prove (10), along with the equality case,
which is the second part of (I.1). Remember that $\Phi$ stands
for either $H$ or $B$. The reals $\alpha$ and $\beta$ sum to 1.



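$$1 - \Phi(a \boxtimes b) \ge (1 - \Phi(a))(1 - \Phi(b)) \tag{10}$$

$$\Phi(a^{\boxtimes d}) \le \Phi(a)\,(d - \Phi(a)) \tag{11}$$

$$1 - \Phi(a) \le \sqrt{1 - \Phi(a \boxtimes a)} \tag{12}$$

$$1 - \Phi(a \boxtimes b) \le \sqrt{1 - \Phi(a \boxtimes a)}\, \sqrt{1 - \Phi(b \boxtimes b)} \tag{13}$$

$$1 - \Phi(a \boxtimes b) \le \sqrt{1 - \Phi(a \boxtimes a \boxtimes b)}\, \sqrt{1 - \Phi(b)} \tag{14}$$

$$\Phi\left((\alpha a + \beta b)^{\boxtimes d}\right) \ge \alpha\, \Phi(a^{\boxtimes d}) + \beta\, \Phi(b^{\boxtimes d}) \tag{15}$$

$$\sqrt[d]{1 - \Phi\left((\alpha a + \beta b)^{\boxtimes d}\right)} \le \alpha \sqrt[d]{1 - \Phi(a^{\boxtimes d})} + \beta \sqrt[d]{1 - \Phi(b^{\boxtimes d})} \tag{16}$$

$$\Phi(a) \le f_\Phi(1 - 2E(a)) \tag{17}$$

$$1 - \Phi(a \boxtimes b) \le (1 - \Phi(a))(1 - 2E(b)) \tag{18}$$
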

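Proof of (10): We do the same as in III-B, except that we use
another inequality than Jensen. Recall from (6) that

$$1 - \Phi(a \boxtimes b) = \sum_n \alpha_{\Phi,n}\, \gamma_{a,n}\, \gamma_{b,n}.$$

We use the following corollary of the FKG inequality

$$\mathbb{E}(fg) \ge \mathbb{E}(f)\, \mathbb{E}(g),$$

whenever $f, g$ have the same monotonicity. The equality case is
when $f$ or $g$ is constant a.e. Here $f : n \mapsto \gamma_{a,n}$,
$g : n \mapsto \gamma_{b,n}$ and $\mathbb{E}(f) = \sum_n \alpha_{\Phi,n} f_n$.

So, since the moments are decreasing, we get

$$1 - \Phi(a \boxtimes b) = \sum_n \alpha_{\Phi,n}\, \gamma_{a,n}\, \gamma_{b,n} \ge \sum_n \alpha_{\Phi,n}\, \gamma_{a,n} \sum_n \alpha_{\Phi,n}\, \gamma_{b,n} = (1 - \Phi(a))(1 - \Phi(b))$$

with equality when $a$ or $b$ is from the BEC family. ■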
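
Inequality (10) and its equality case can be observed numerically for $\Phi = H$. The following sketch assumes channels stored as (weight, crossover probability) pairs; the channels `a`, `b` and `bec` are arbitrary examples and the helper names are illustrative.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def H(a):
    """Entropy functional of a channel given as a list of (weight, epsilon)."""
    return sum(w * h2(e) for w, e in a)

def H_check(a, b):
    """Entropy of the check convolution of a and b."""
    return sum(wa * wb * h2((1.0 - (1.0 - 2.0 * ea) * (1.0 - 2.0 * eb)) / 2.0)
               for wa, ea in a for wb, eb in b)

a = [(0.5, 0.02), (0.5, 0.2)]    # a two-point BMS channel
b = [(1.0, 0.11)]                # a BSC
bec = [(0.6, 0.0), (0.4, 0.5)]   # BEC(0.4)

gap_bsc = (1.0 - H_check(a, b)) - (1.0 - H(a)) * (1.0 - H(b))
gap_bec = (1.0 - H_check(a, bec)) - (1.0 - H(a)) * (1.0 - H(bec))
print(gap_bsc, gap_bec)
```

The first gap should be non-negative and the second should vanish up to rounding, matching the equality case for the BEC family.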



V. An application: studying the area threshold

Remember our initial problem, which was to study when (2)
holds. Fix $c_0 > 0$; we would like to know first when

$$-h - \left(d_l - 1 - \frac{d_l}{d_r}\right) H(a^{\boxtimes d_r}) + (d_l - 1)\, H(a^{\boxtimes d_r - 1}) \ge c_0 \tag{19}$$

holds. We are going to show

Lemma V.1. If the following two conditions are fulfilled then
(19) holds.

(i) $\left(1 - 2h_2^{-1}(h)\right)^2 \le \left(\frac{c_0}{d_l - 1}\right)^{\frac{1}{d_r - 1}}$

(ii) $h \le \frac{d_l}{d_r} - 2c_0$

Proof: Define

$$d = d_r, \qquad \kappa = \frac{d_l - 1 - \frac{d_l}{d_r}}{d_l - 1}, \qquad p(X) = X^{d-1} - \kappa X^{d}$$

We are going to use the bound from Lemma II.3. The
condition for $p$ to be increasing over the range of interest
is $\left(1 - 2h_2^{-1}(h)\right)^2 \le \frac{d - 1}{\kappa d}$,
which is always true when $\kappa$ is given the value
$\kappa = \frac{d_l - 1 - \frac{d_l}{d_r}}{d_l - 1}$. So by Lemma II.3

$$H(p(a)) \ge 1 - \kappa - \left(1 - 2h_2^{-1}(h)\right)^{2(d-1)} + \kappa \left(1 - 2h_2^{-1}(h)\right)^{2d}$$

and then,

$$-h - \left(d_l - 1 - \frac{d_l}{d_r}\right) H(a^{\boxtimes d_r}) + (d_l - 1)\, H(a^{\boxtimes d_r - 1}) = -h + (d_l - 1)\, H(p(a))$$

$$\ge -h + (d_l - 1) \left[1 - \kappa - p\left(\left(1 - 2h_2^{-1}(h)\right)^2\right)\right]$$

Also $(d_l - 1)(1 - \kappa) = \frac{d_l}{d_r}$. So for (19) to hold
it is enough that

$$-h + (d_l - 1) \left[1 - \kappa - p\left(\left(1 - 2h_2^{-1}(h)\right)^2\right)\right] \ge c_0$$

or

$$\frac{d_l}{d_r} - h - (d_l - 1)\, p\left(\left(1 - 2h_2^{-1}(h)\right)^2\right) \ge c_0$$

which can be rewritten

$$\frac{d_l}{d_r} - h \ge \underbrace{(d_l - 1)\, p\left(\left(1 - 2h_2^{-1}(h)\right)^2\right)}_{\xi(h)} + c_0 \tag{20}$$

(i) is s.t. $\xi(h) \le c_0$, and then (ii) makes (20) true. Indeed,

$$\xi(h) = (d_l - 1)\left(1 - 2h_2^{-1}(h)\right)^{2d_r - 2} - \left(d_l - 1 - \frac{d_l}{d_r}\right)\left(1 - 2h_2^{-1}(h)\right)^{2d_r}$$

$$\le (d_l - 1)\left(1 - 2h_2^{-1}(h)\right)^{2d_r - 2}$$

$$\le (d_l - 1)\, \frac{c_0}{d_l - 1} = c_0$$

■

If we are interested only in the sign of the expression $A(h)$
in (2) and not in how far it is from 0, we can let
$c_0 = f(d_l, d_r)$ to increase the range of valid $h$. For
instance, taking $c_0 = (d_l - 1) \exp(-\sqrt{d_r - 1})$,

$$\text{(i)} \iff h_2^{-1}(h) \ge \frac{K}{\sqrt{d_r}}\,(1 + o(1))$$

where $K$ is some constant. Asymptotically this can be taken
(changing the constant) to be simply

$$h_2^{-1}(h) \ge \frac{K}{\sqrt{d_r}}$$

In the end, we are left with

Proposition V.2. For $d_l, d_r$ large enough, the range for which
(2) holds contains an interval of the form
$[L(d_l, d_r); R(d_l, d_r)]$ where

$$L(d_l, d_r) = h_2\left(\frac{K}{\sqrt{d_r}}\right)$$

$$R(d_l, d_r) = \frac{d_l}{d_r} - O\left(d_r \exp(-\sqrt{d_r})\right)$$

Remark. Actually, changing $c_0(d_l, d_r)$, we could replace any
$\sqrt{\cdot}$ by $(\cdot)^{\alpha}$ for any $\alpha < 1$.

Proposition II.1 in this context can be rephrased as

Corollary V.3. Define $\kappa$ as above

$$\kappa = \frac{d_l - 1 - \frac{d_l}{d_r}}{d_l - 1} \tag{21}$$

Amongst channels $x$ with fixed entropy $h$, assuming the
following condition (the ∪-convexity of $p$ over the range of the
moments) is fulfilled

$$\left(1 - 2h_2^{-1}(h)\right)^2 \le \frac{d_r - 2}{\kappa\, d_r} \tag{22}$$

then the BEC($h$) maximizes

$$-h - \left(d_l - 1 - \frac{d_l}{d_r}\right) H(x^{\boxtimes d_r}) + (d_l - 1)\, H(x^{\boxtimes d_r - 1})$$

VI. Convex optimization and the shape of extremal densities

Classical convex analysis provides powerful tools that allow,
at least in the case where the target functional is linear, to
drastically reduce the range of possible optimizers. Remember
we represented the channels by probability measures over
[0; 1]. The basic principle is as follows.

Theorem VI.1 (Dual Caratheodory). Take $\Phi$ any continuous
linear functional over BMS channels, like all those discussed
above, and consider the following problem

OPT $\Phi(a)$
s.t. $(\Phi_1(a), \ldots, \Phi_m(a)) = (\phi_1, \ldots, \phi_m)$

Then there are extremal densities $a_+$ and $a_-$ with support of
cardinality at most $m + 1$.

Discussion. The constraints are also assumed to be linear. A
more extensive source on the topic is [5].

This principle sheds some light on the fact that the BSC 
(which has one mass point in our representation) and the BEC 
(which has two) appear so often as extremal densities, when 
we consider problems with a single constraint. Indeed, one 
constraint corresponds to at most two mass points. 



Extensions of the Caratheodory Principle were amongst the
tools used in [6] to track two channel parameters (namely $H$
and $E$) through the process of iterative decoding. As a result,
new bounds on iterative decoding were shown.

It seems hard to derive proofs using solely VI.1. However, it
can be used to do numerical experiments. One way to proceed
is as follows. Consider the target functional $\Phi(p(a))$ where
$p$ is of degree $d$. Introduce $d$ variables
$a_1, a_2, \ldots, a_d$ and replace (for $k \le d$)

$$a^{\boxtimes k} \quad \text{by} \quad a_1 \boxtimes a_2 \boxtimes \cdots \boxtimes a_k$$

Denote $\tilde{\Phi}(a_1, \ldots, a_d)$ the expression we get. If
it is maximized by a tuple where all coefficients $a_i$ are the
same, then we know the initial expression has the same maximizer.
To maximize $\tilde{\Phi}$, a simple tractable heuristic is to
optimize coordinate after coordinate: starting from random
$a_i$'s, fix each coordinate except coordinate $i$, then find
the best combination of two $\delta$'s for this coordinate, and
repeat for all $i \le d$. This gave good results for the
motivational expression of (2) and led to the claim

Claim. The expression in (2), for any $h$ and when
$(d_l, d_r) = (3, 6)$ or $(5, 10)$ (the cases we tested), is
always minimized by the BSC and maximized by the BEC.
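
For readers who want to experiment, the two candidate extremal channels can be compared directly. The sketch below is not the coordinate-wise heuristic described above; it only evaluates, for the BEC and the BSC with a given entropy $h$, the quantity $-h - (d_l - 1 - d_l/d_r) H(x^{\boxtimes d_r}) + (d_l - 1) H(x^{\boxtimes d_r - 1})$, which is assumed here to be the expression referred to in the claim. The function names and the sample value $h = 0.3$ are illustrative.

```python
import math

def h2(x):
    """Binary entropy in bits."""
    if x <= 0.0 or x >= 1.0:
        return 0.0
    return -x * math.log2(x) - (1.0 - x) * math.log2(1.0 - x)

def h2_inv(h):
    """Inverse of h2 on [0, 1/2], computed by bisection."""
    lo, hi = 0.0, 0.5
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if h2(mid) < h:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def area_expr(H_pow, h, dl, dr):
    """-h - (dl - 1 - dl/dr) * H(x^{dr}) + (dl - 1) * H(x^{dr-1}),
    where H_pow(d) returns the entropy of the d-fold check convolution of x."""
    return -h - (dl - 1 - dl / dr) * H_pow(dr) + (dl - 1) * H_pow(dr - 1)

def area_expr_bec(h, dl, dr):
    # For x = BEC(h): the d-fold check convolution is BEC(1 - (1 - h)^d).
    return area_expr(lambda d: 1.0 - (1.0 - h) ** d, h, dl, dr)

def area_expr_bsc(h, dl, dr):
    # For x = BSC(eps): the d-fold check convolution is a BSC with
    # 1 - 2*eps_d = (1 - 2*eps)^d.
    x = 1.0 - 2.0 * h2_inv(h)
    return area_expr(lambda d: h2((1.0 - x ** d) / 2.0), h, dl, dr)

val_bec = area_expr_bec(0.3, 3, 6)
val_bsc = area_expr_bsc(0.3, 3, 6)
print(val_bec, val_bsc)
```

A single entropy value proves nothing about the claim; scanning a grid of $h$ and adding two-point mixtures would be the natural next step.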